K nearest neighbours with mutual information for simultaneous classification and missing data imputation

Authors

  • Pedro J. García-Laencina
  • José-Luis Sancho-Gómez
  • Aníbal R. Figueiras-Vidal
  • Michel Verleysen
Abstract

Missing data is a common drawback in many real-life pattern classification scenarios. One of the most popular solutions is missing data imputation by the K nearest neighbours (KNN) algorithm. In this article, we propose a novel KNN imputation procedure using a feature-weighted distance metric based on mutual information (MI). This method provides a missing data estimation aimed at solving the classification task, i.e., it provides an imputed dataset which is directed toward improving the classification performance. The MI-based distance metric is also used to implement an effective KNN classifier. Experimental results on both artificial and real classification datasets are provided to illustrate the efficiency and the robustness of the proposed algorithm. © 2009 Elsevier B.V. All rights reserved.
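The idea summarised in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the histogram-based MI estimate, the normalisation of the weights, the distance computed only over features observed in both samples, the mean-of-neighbours imputation, the encoding of missing values as `None`, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

def mutual_information(values, labels, bins=4):
    """Histogram MI estimate (nats) between one feature and the class labels,
    using only samples where the feature is observed (assumed estimator)."""
    pairs = [(v, c) for v, c in zip(values, labels) if v is not None]
    if not pairs:
        return 0.0
    lo = min(v for v, _ in pairs)
    hi = max(v for v, _ in pairs)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features
    binned = [(min(int((v - lo) / width), bins - 1), c) for v, c in pairs]
    n = len(binned)
    joint = Counter(binned)
    px = Counter(b for b, _ in binned)
    pc = Counter(c for _, c in binned)
    return sum((k / n) * math.log(k * n / (px[b] * pc[c]))
               for (b, c), k in joint.items())

def mi_weights(X, y, bins=4):
    """One weight per feature, proportional to its MI with the class."""
    d = len(X[0])
    mi = [mutual_information([row[j] for row in X], y, bins) for j in range(d)]
    total = sum(mi)
    return [m / total for m in mi] if total > 0 else [1.0 / d] * d

def weighted_distance(a, b, w):
    """Weighted Euclidean distance over features observed in both samples."""
    num = den = 0.0
    for aj, bj, wj in zip(a, b, w):
        if aj is not None and bj is not None:
            num += wj * (aj - bj) ** 2
            den += wj
    return math.sqrt(num / den) if den > 0 else math.inf

def knn_impute(X, y, k=3, bins=4):
    """Fill each missing entry with the mean of that feature over the k
    nearest samples (under the MI-weighted distance) that observe it."""
    w = mi_weights(X, y, bins)
    imputed = [list(row) for row in X]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            if v is not None:
                continue
            donors = sorted(
                (weighted_distance(row, other, w), other[j])
                for m, other in enumerate(X)
                if m != i and other[j] is not None
            )[:k]
            donors = [val for dist, val in donors if dist < math.inf]
            if donors:
                imputed[i][j] = sum(donors) / len(donors)
    return imputed, w

def knn_classify(x, X, y, w, k=3):
    """Majority vote among the k nearest training samples."""
    nearest = sorted(zip((weighted_distance(x, r, w) for r in X), y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data: features 0 and 1 separate the classes, feature 2 is noise.
X = [
    [0.10, 1.0, 5.0], [0.20, 1.1, 1.0], [0.30, 0.9, 4.0], [0.15, None, 2.0],
    [0.90, 3.0, 1.5], [1.00, 3.1, 4.5], [0.80, 2.9, 2.5], [0.85, 3.2, 5.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

imputed, w = knn_impute(X, y, k=3)
print("feature weights:", w)
print("imputed value:", imputed[3][1])
print("predicted class:", knn_classify([0.2, None, 4.8], imputed, y, w))
```

In this toy dataset the noise feature receives zero weight, so the neighbours used for imputation are chosen by the class-relevant feature alone. That is the sense in which the imputation is "directed toward" the classification task.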


Similar resources

K-nearest neighbours based on mutual information for incomplete data classification

Incomplete data is a common drawback that machine learning techniques need to deal with when solving real-life classification tasks. One of the most popular procedures for solving this kind of problems is the K-nearest neighbours (KNN) algorithm. In this paper, we present a weighted KNN approach using mutual information to impute and classify incomplete input data. Numerical results on both art...


Improving Classification Accuracy Using Missing Data Filling Algorithms for the Criminal Dataset

Predicting crime types by using classification algorithms can help to find factors affecting crimes and prevent crimes. Due to various reasons in the process of data collection, there are often a large number of missing values in actual criminal dataset, which seriously affects the classification accuracy. Therefore, based on mutual KNNI (K nearest neighbor imputation) algorithm and combined wi...


Benchmarking K-nearest Neighbour Imputation with Homogeneous Likert Data (P. Jönsson and C. Wohlin, Empirical Software Engineering)

Missing data are common in surveys regardless of research field, undermining statistical analyses and biasing results. One solution is to use an imputation method, which recovers missing data by estimating replacement values. Previously, we have evaluated the hot-deck k-Nearest Neighbour (kNN) method with Likert data in a software engineering context. In this paper, we extend the evaluation by ...


Feature selection with missing data using mutual information estimators

Feature selection is an important preprocessing task for many machine learning and pattern recognition applications, including regression and classification. Missing data are encountered in many real-world problems and have to be considered in practice. This paper addresses the problem of feature selection in prediction problems where some occurrences of features are missing. To this end, the w...


Analysis of missing observations in a study of the effect of different doses of vitamin D supplementation on insulin resistance during pregnancy

Introduction: The aim of this study was to impute missing data and to compare the effect of different doses of vitamin D supplementation on insulin resistance during pregnancy. Methods: A clinical trial study was done on 104 women with diabetes and gestational age less than 12 weeks between 1391 and...



Journal title:
  • Neurocomputing

Volume 72

Publication date: 2009